15 research outputs found

    Analysis of recombination in molecular sequence data

    We present Recco, a new and fast method for analyzing recombination in a multiple alignment. Recco is based on a dynamic program that explains one sequence in the alignment using the other sequences, with mutation and recombination as the allowed operations. The dynamic program allows for an intuitive visualization of the optimal solution and introduces a parameter α that controls the number of recombinations in the solution. Recco performs a parametric analysis over α and orders all Pareto-optimal solutions by increasing number of recombinations. α is also directly related to the Savings value, a quantitative and intuitive measure of the preference for recombination in the solution. The Savings value and the solutions have a simple interpretation in terms of the ancestry of the sequences in the alignment, so the output of the method is usually easy to understand. The distribution of the Savings value for non-recombining alignments is estimated by processing column permutations of the alignment, which yields p-values for recombination in the alignment, in a sequence, and at a breakpoint position. Recco also uses the p-values to suggest a single solution, or recombinant structure, for the explained sequence. Recco is validated on a large set of simulated alignments, where its recombination detection performance is superior to that of all current methods. The analysis of real alignments confirmed that Recco is among the best methods for recombination analysis and further supported that its results are often easier to interpret than those of other methods.
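    The dynamic program described in this abstract can be illustrated with a minimal sketch. The cost model below is an assumption made for illustration and is not taken from the source: a mismatch against the currently copied sequence costs `alpha`, and switching to a different sequence (a recombination) costs `1 - alpha`. The function name `explain_cost` and its return values are hypothetical, and all sequences are assumed to be non-empty and of equal length.

```python
def explain_cost(target, sources, alpha):
    """Cheapest way to rebuild `target` by copying from `sources` column by column.

    Assumed cost model (illustrative only): a mismatch against the copied
    source costs `alpha`; switching sources (recombination) costs `1 - alpha`.
    Returns (total cost, source index per column, breakpoint columns).
    """
    mut, rec = alpha, 1.0 - alpha
    n = len(target)
    # best[j]: cheapest cost of explaining the columns so far while copying source j
    best = [mut * (s[0] != target[0]) for s in sources]
    back = []  # back[i - 1][j]: source used at column i - 1 on the best path into j
    for i in range(1, n):
        k_min = min(range(len(sources)), key=best.__getitem__)
        new_best, pointers = [], []
        for j, s in enumerate(sources):
            stay = best[j]                # keep copying from source j
            switch = best[k_min] + rec    # recombine from the cheapest source
            prev, cost = (j, stay) if stay <= switch else (k_min, switch)
            new_best.append(cost + mut * (s[i] != target[i]))
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the optimal path back from the cheapest final state.
    j = min(range(len(sources)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    breakpoints = [i for i in range(1, n) if path[i] != path[i - 1]]
    return total, path, breakpoints
```

    Under this assumed model, explaining `AAAATTTT` from the sources `AAAAAAAA` and `TTTTTTTT` with `alpha = 0.5` gives a single breakpoint at column 4 (0-based), while `alpha = 0.1` makes recombination too expensive and the solution uses four mutations instead. Sweeping `alpha` in this way is one simple reading of the parametric analysis mentioned above.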

    Expression pattern analysis of transcribed HERV sequences is complicated by ex vivo recombination

    Background: The human genome comprises numerous human endogenous retroviruses (HERVs) that formed millions of years ago in ancestral species. A number of loci of the HERV-K(HML-2) family are evolutionarily much younger. A recent study suggested an infectious HERV-K(HML-2) variant in humans and other primates. Isolating such a variant from human individuals would be a significant finding for human biology.
    Results: When investigating expression patterns of specific HML-2 proviruses we encountered HERV-K(HML-2) cDNA sequences without proviral homologues in the human genome, named HERV-KX, that could very well support recently suggested infectious HML-2 variants. However, detailed sequence analysis, using the software RECCO, suggested that HERV-KX sequences were produced by recombination, possibly arising ex vivo, between transcripts from different HML-2 proviral loci.
    Conclusion: As RT-PCR probably will be instrumental for isolating an infectious HERV-K(HML-2) variant, generation of "new" HERV-K(HML-2) sequences by ex vivo recombination seems inevitable. Further complicated by an unknown amount of allelic sequence variation in HERV-K(HML-2) proviruses, newly identified HERV-K(HML-2) variants should be interpreted very cautiously.

    An Extended Set of Haar-Like Features for Rapid Object Detection

    Recently Viola et al. [5] have introduced a rapid object detection scheme based on a boosted cascade of simple features. In this paper we introduce a novel set of rotated Haar-like features, which significantly enrich this basic set of simple Haar-like features and which can also be calculated very efficiently. At a given hit rate, our sample face detector shows on average a 10% lower false alarm rate when these additional rotated features are used. We also present a novel post-optimization procedure for a given boosted cascade that further improves the false alarm rate by 12.5% on average. Using both enhancements, the number of false detections is only 24 at a hit rate of 82.3% on the CMU face set [7].
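    As background for why Haar-like features are cheap to evaluate: upright rectangle sums are commonly computed from a summed-area table (integral image), so each rectangle costs four lookups regardless of its size. The sketch below covers only upright features; the rotated features introduced in the paper need a rotated variant of the table that is not shown here. The function names are hypothetical and chosen for illustration.

```python
def integral_image(img):
    """Summed-area table with a zero border: ii[y][x] = sum of img rows < y, cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Pixel sum of the w-by-h rectangle with top-left corner (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature(ii, x, y, w, h):
    """Two-rectangle edge feature: left half minus right half (w should be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

    For a 2-by-4 image whose left half is 1 and right half is 0, `edge_feature(ii, 0, 0, 4, 2)` evaluates to 4, the strongest possible response of this feature on that window.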

    Recco: Recombination analysis using cost optimization

    Motivation: Recombination plays an important role in the evolution of many pathogens, such as HIV or malaria. Despite substantial prior work, there is still a pressing need for efficient and effective methods of detecting recombination and analyzing recombinant sequences. Results: We introduce Recco, a novel fast method that, given a multiple sequence alignment, scores the cost of obtaining one of the sequences from the others by mutation and recombination. The algorithm comes with an illustrative visualization tool for locating recombination breakpoints. We analyze the sequence alignment with respect to all choices of the parameter α, which weights recombination cost against mutation cost. The analysis of the resulting cost curve yields additional information as to which sequence might be recombinant. On random genealogies, Recco is comparable in its power of detecting recombination to the algorithm Geneconv (Sawyer 1989). For specific relevant recombination scenarios, Recco significantly outperforms Geneconv.

    Calculating the Statistical Significance of Changes in Pathway Activity From Gene Expression Data

    We present a statistical approach to scoring changes in activity of metabolic pathways from gene expression data. The method identifies the biologically relevant pathways with corresponding statistical significance. Based on gene expression data alone, only local structures of genetic networks can be recovered. Instead of inferring such a network, we propose a hypothesis-based approach. We use given knowledge about biological networks to improve sensitivity and interpretability of findings from microarray experiments.
    Recently introduced methods test if members of predefined gene sets are enriched in a list of top-ranked genes in a microarray study. We improve this approach by defining scores that depend on all members of the gene set and that also take pairwise co-regulation of these genes into account. We calculate the significance of co-regulation of gene sets with a nonparametric permutation test. On two data sets the method is validated and its biological relevance is discussed. It turns out that useful measures for co-regulation of genes in a pathway can be identified adaptively.
    We refine our method in two aspects specific to pathways. First, to overcome the ambiguity of enzyme-to-gene mappings for a fixed pathway, we introduce algorithms for selecting the best fitting gene for a specific enzyme in a specific condition. In selected cases, functional assignment of genes to pathways is feasible. Second, the sensitivity of detecting relevant pathways is improved by integrating information about pathway topology. The distance of two enzymes is measured by the number of reactions needed to connect them, and enzyme pairs with a smaller distance receive a higher weight in the score calculation.
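    The permutation test on gene-set co-regulation might look roughly like the sketch below. Both the score (mean absolute pairwise Pearson correlation) and the null model (random gene sets of the same size) are assumptions made for illustration; the abstract does not specify either, and the function names are hypothetical. Topology-based pair weights are omitted.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def coregulation_score(expr, genes):
    """Assumed score: mean absolute correlation over all gene pairs in the set."""
    pairs = list(combinations(genes, 2))
    return sum(abs(pearson(expr[a], expr[b])) for a, b in pairs) / len(pairs)


def permutation_p_value(expr, gene_set, n_perm=1000, seed=0):
    """Compare the observed score with scores of random gene sets of equal size."""
    rng = random.Random(seed)
    observed = coregulation_score(expr, gene_set)
    all_genes = sorted(expr)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(gene_set))
        if coregulation_score(expr, sample) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

    Here `expr` maps each gene name to its expression profile across conditions, and `gene_set` lists the genes of one pathway (at least two). A distance-weighted variant, as described in the last paragraph of the abstract, would multiply each pair's contribution by a weight that decreases with the number of reactions separating the two enzymes.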

    Ab Initio Loop Modeling with Precalculated Synthetic Loops and Sidechain Placement
